(quer$3 near4 (classif$5 cluster$3)) and (context$1 near3
vector$1) 



(quer$3 near3 context$1 ) same (context$1 near3 vector$1) 



(quer$3 near3 classif$6) and (context$1 near3 vector$3) 



US-6502091-B1.DtD. and 6502091. PN. and (6502091. PN. 
and (6502091. pn. and (user$4 same context$4))) 



(((analyz$3 pars$3) near5 quer$3 ) and ((histor$3 log$3) 
near5 quer$3)) and ((classif$4 cluste4$3) near5 quer$3) 



6502091. PN. and (6502091. PN. and (6502091. pn. and 
(user$4 same context$4))) and ((quer$3 near3 context$1 ) 
same (context$1 near3 vector$1)) 



6502091. PN. and (6502091. PN. and (6502091 .pn. and 
(user$4 same context$4))) and ((((analyz$3 pars$3) near5
quer$3 ) and ((histor$3 log$3) near5 quer$3)) and ((classif$4 
cluste4$3) near5 quer$3)) 

((user$3 near5 interact$4 near5 (stat$3 data information)) 
(user$4 near5 (histor$4 log$4))) same (user$4 near5 
context$4 near5 (vector$4 classif$4 cluster$4)) 



(6502091. PN. and (6502091. pn. and (user$4 same 
context$4))) and ((((analyz$3 pars$3) near5 quer$3 ) and 
((histor$3 log$3) near5 quer$3)) and ((classif$4 cluste4$3) 
near5 quer$3)) 

((((analyz$3 pars$3) near5 quer$3 ) and ((histor$3 log$3) 
near5 quer$3)) and ((classif$4 cluste4$3) near5 quer$3)) and 
(6502091. PN. and (6502091. PN. and (6502091 .pn. and 
(user$4 same context$4))) and ((quer$3 near3 context$1 ) 
same (context$1 near3 vector$1))) 
(6502091. PN. and (6502091. PN. and (6502091. pn. and 
(user$4 same context$4))) and ((quer$3 near3 context$1 ) 
same (context$1 near3 vector$1))) and ((((analyz$3 pars$3) 
near5 quer$3 ) and ((histor$3 log$3) near5 quer$3)) and 
((classif$4 cluste4$3) near5 quer$3)) and (((user$3 near5
interact$4 near5 (stat$3 data information)) (user$4 near5
(histor$4 log$4))) same (user$4 near5 context$4 near5 
(vector$4 classif$4 cluster$4))) 

((((analyz$3 pars$3) near5 quer$3 ) and ((histor$3 log$3) 
near5 quer$3)) and ((classif$4 cluste4$3) near5 quer$3)) and 
(((user$3 near5 interact$4 near5 (stat$3 data information))
(user$4 near5 (histor$4 log$4))) same (user$4 near5
context$4 near5 (vector$4 classif$4 cluster$4))) and ( 
(6502091 .PN. and (6502091. pn. and (user$4 same 
context$4))) and ((((analyz$3 pars$3) near5 quer$3 ) and 
((histor$3 log$3) near5 quer$3)) and ((classif$4 cluste4$3) 
near5 quer$3)))
16


0 


(((user$3 near5 interact$4 near5 (stat$3 data information))
(user$4 near5 (histor$4 log$4))) same (user$4 near5
context$4 near5 (vector$4 classif$4 cluster$4))) and (
(6502091. PN. and (6502091. pn. and (user$4 same
context$4))) and ((((analyz$3 pars$3) near5 quer$3 ) and
((histor$3 log$3) near5 quer$3)) and ((classif$4 cluste4$3)
near5 quer$3))) and ( ((((analyz$3 pars$3) near5 quer$3 )
and ((histor$3 log$3) near5 quer$3)) and ((classif$4
cluste4$3) near5 quer$3)) and (6502091. PN. and
(6502091. PN. and (6502091. pn. and (user$4 same 
context$4))) and ((quer$3 near3 context$1 ) same (context$1 
near3 vector$1)))) and ( (6502091. PN. and (6502091. PN. and 
(6502091. pn. and (user$4 same context$4))) and ((quer$3 
near3 context$1 ) same (context$1 near3 vector$1))) and
((((analyz$3 pars$3) near5 quer$3 ) and ((histor$3 log$3)
near5 quer$3)) and ((classif$4 cluste4$3) near5 quer$3)) and
(((user$3 near5 interact$4 near5 (stat$3 data information))
(user$4 near5 (histor$4 log$4))) same (user$4 near5
context$4 near5 (vector$4 classif$4 cluster$4))) )


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM_TDB 


2004/07/28 14:56 


17 


0 


( (6502091. PN. and (6502091 .pn. and (user$4 same 
context$4))) and ((((analyz$3 pars$3) near5 quer$3 ) and 
((histor$3 log$3) near5 quer$3)) and ((classif$4 cluste4$3)
near5 quer$3))) and ( ((((analyz$3 pars$3) near5 quer$3 ) 
and ((histor$3 log$3) near5 quer$3)) and ((classif$4
cluste4$3) near5 quer$3)) and (6502091. PN. and 
(6502091. PN. and (6502091. pn. and (user$4 same 
context$4))) and ((quer$3 near3 context$1 ) same (context$1 
near3 vector$1)))) and ( (6502091. PN. and (6502091. PN. and 
(6502091. pn. and (user$4 same context$4))) and ((quer$3 
near3 context$1 ) same (context$1 near3 vector$1))) and 
((((analyz$3 pars$3) near5 quer$3 ) and ((histor$3 log$3)
near5 quer$3)) and ((classif$4 cluste4$3) near5 quer$3)) and
(((user$3 near5 interact$4 near5 (stat$3 data information)) 
(user$4 near5 (histor$4 log$4))) same (user$4 near5 
context$4 near5 (vector$4 classif$4 cluster$4))) )


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM_TDB 


2004/07/28 14:56 
context$1 near4 attribut$3 near4 database$1


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/25 10:17 
context$3 near2 vector$3


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/02/11 08:09 
5303361.pn.


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/25 13:47
USPAT;
EPO; JPO;
DERWENT;
IBM_TDB
5524187.pn.
EPO; JPO;
DERWENT;
USPAT;
EPO; JPO;
DERWENT;
IBM_TDB


2003/02/11 10:53 
5600835.pn.


USPAT; 


2003/02/11 10:54 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 
5608899.pn.


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/02/11 10:54 


- 
5619709.pn.


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/02/11 10:54 
2


5710899.pn. 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/02/11 10:55 
2


5754939.pn. 


USPAT; 


2003/02/11 10:55 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 
5768578.pn.


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/02/11 10:56 
5930501.pn.


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/02/11 11:02 
5826260.pn.


USPAT; 


2003/02/11 11:03 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 
2


5446891.pn.


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/02/11 11:03 


- 


26 


quer$3 near7 (context$3 near2 vector$3) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:32 


- 


3 


quer$3 near4 context$1 near4 classif$6 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:34 


- 


715 


quer$3 near4 context$1 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:39 
quer$3 near3 context$1
DERWENT;
2003/07/16 10:35
20


(quer$3 near3 context$1 ) same (context$1 near3 vector$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/07/28 14:54 


- 


27 


(quer$3 near3 context$1 ) and (context$1 near3 vector$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:55 
16


quer$3 near3 context$1 near3 vector$1 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:42 
165


quer$3 near3 classif$6 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:42 
(quer$3 near3 classif$6) and (context$1 near3 vector$3)


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/07/28 14:54 


- 


342 


intelligen$4 near3 quer$3 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:44 


- 


9 


(intelligen$4 near3 quer$3) and (context$1 near3 vector$3)


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:45 
quer$3 near4 (context$3 near2 vector$3)


USPAT;
US-PGPUB;
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:46 
502


quer$4 near4 (classif$5 cluster$3) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 10:55 
14


(quer$4 near4 (classif$5 cluster$3)) and (context$1 near3 
vector$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/07/28 14:54 


- 


53 


(quer$4 near4 (classif$5 cluster$3)) same vector$1


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 11:04 
context$1 near3 attribut$3 near3 database$1


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/16 15:45 




0 


6327590.pn and (context$1 near5 vector$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 09:22 
6327590.pn and (context$1 near10 vector$1)


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 09:21 


- 


0 


6327590.pn and (context$1 same vector$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 09:21 


- 


0 


6327590.pn and (context$1 and vector$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 09:22 


- 


1 


6327590.pn. and (context$1 near5 vector$1)


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 09:22 


- 


1 


6327590.pn. and (context$1 near10 vector$1)


USPAT;
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 09:22 


- 


1 


6327590.pn. and (context$1 same vector$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 10:34 


- 


293 


user near3 quer$3 near4 record$1 


USPAT; 
US-PGPUB; 
EPO; JPO;
DERWENT; 
IBM TDB 


2003/07/25 10:35 


- 


36 


( user near3 quer$3 near4 record$1) and ((analyz$4 pars$3)
near4 record$1) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 10:36


- 


37 


( user near3 quer$3 near4 record$1) and ((analyz$4 pars$3)
near4 record$3)


USPAT;
US-PGPUB;
EPO; JPO;
DERWENT; 
IBM TDB 


2003/07/25 10:41 


- 
(analyz$4 pars$3) near5 quer$3


USPAT; 
US-PGPUB; 
EPO; JPO;
DERWENT; 
IBM TDB 


2003/07/25 10:59


- 


385 


((analyz$4 pars$3) near5 quer$3 ) and ((histor$3 log$3) near5
quer$3) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 10:48


- 


18 


(((analyz$4 pars$3) near5 quer$3 ) and ((histor$3 log$3)
near5 quer$3)) and ((classif$4 cluste4$3) near5 quer$3)


USPAT; 
US-PGPUB; 
EPO; JPO; 

DERWENT;
IBM_TDB


2004/07/28 14:55 
record$3 near3 quer$3
record$3 near2 quer$3


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 10:48


- 


266 


(record$3 near2 quer$3 ) and ((analyz$4 pars$3) near5
quer$3) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 10:49 


- 


109 


((record$3 near2 quer$3 ) and ((analyz$4 pars$3) near5
quer$3)) and (classif$4 cluster$3) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 10:54 


- 


42 


quer$3 nearS log$1 near3 file$1 


USPAT; 


2003/07/25 10:59 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 




- 


1247 


usag$3 near3 log$3 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2003/07/25 10:59 


- 


11 


(usag$3 near3 log$3) and ((analyz$4 pars$3) near5 quer$3)


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT;
IBM TDB 


2003/07/25 11:00 


- 


47991 


(user$4 near5 interact$4 near5 (stat$3 data information))
(user$4 near5 (histor$4 log$4)) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 15:56 


- 


12 


((user$4 near5 interact$4 near5 (stat$3 data information))
(user$4 near5 (histor$4 log$4))) same (user$4 near5
context$4 near5 (vector$4 classif$4 cluster$4)) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/07/28 14:55 


- 


789 


(user$4 near5 quer$4 near5 (histor$4 log$4))


USPAT; 


2004/02/24 16:01 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 




- 


3 


( (user$4 near5 quer$4 near5 (histor$4 log$4))) same
(user$4 near5 context$4 near5 (vector$4 classif$4 cluster$4))


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 15:58 




3731 


( quer$4 near5 (histor$4 log$4))


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 15:58 


- 


5 


( ( quer$4 near5 (histor$4 log$4))) same (user$4 near5


USPAT; 


2004/02/24 16:00 






context$4 near5 (vector$4 classif$4 cluster$4))


US-PGPUB; 








EPO; JPO; 
DERWENT; 
IBM TDB 






15 


( ( quer$4 near5 (histor$4 log$4))) and (user$4 near5
context$4 near5 (vector$4 classif$4 cluster$4)) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:09 
( ( quer$4 near5 (histor$4 log$4))) same ( context$4 near5
(vector$4 classif$4 cluster$4)) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:02 


- 


40987 


(user$4 near5 (histor$4 log$4))


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:01 


- 


12 


( (user$4 near5 (histor$4 log$4))) same ( context$4 near5
(vector$4 classif$4 cluster$4)) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:02 


- 


3 


"09/778146'* 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:09 


- 


1 


6456978. pn. AND (receiv$4 same quer$4) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:15 


- 


1 


6456978. pn. AND (context$4 samer vector$4) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:16 


- 


1 


(6456978.pn. AND (receiv$4 same quer$4)) and 
(6456978. pn. AND (context$4 samer vector$4)) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:16 


- 


1 


6456978.pn. AND (context$4 same vector$4) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:17 


- 


1 


(6456978. pn. AND (context$4 same vector$4)) and 
(6456978. pn. AND (context$4 samer vector$4)) 


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:17 


- 


1 


6456978.pn. AND (user$4 near10 context$4 )


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:17 


- 


1 


((6456978.pn. AND (context$4 same vector$4)) and 
(6456978. pn. AND (context$4 samer vector$4))) and 
(6456978.pn. AND (user$4 near10 context$4 ))


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/02/24 16:18 


- 


0 


6456978.pn. AND (context$4 near5 attribut$4 ) 


USPAT; 


2004/02/24 16:18 






US-PGPUB; 
EPO; JPO; 
DERWENT;
IBM TDB 






1 


6456978.pn. AND (context$4 same attribut$4 ) 


USPAT; 


2004/02/25 09:59 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 
1


(((6456978. pn. AND (context$4 same vector$4)) and 
(6456978. pn. AND (context$4 samer vector$4))) and 
(6456978.pn. AND (user$4 near10 context$4 ))) and


USPAT; 
US-PGPUB; 
EPO; JPO; 


2004/02/24 16:19 






(6456978.pn. AND (context$4 same attribut$4 )) 


DERWENT; 








IBM TDB 




- 


1 


6456978. pn. AND (context$4 same attribut$4 ) 


USPAT; 


2004/02/25 10:01 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 




- 


142 


user$3 near5 quer$4 near5 histor$4 


USPAT; 


2004/02/25 10:18 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 




- 


593 


user$3 near5 quer$4 near5 analy$4 


USPAT; 


2004/02/25 10:18 






US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 




- 


24 
1 Techniques for automatically correcting words in text
Karen Kukich 

ACM Computing Surveys (CSUR) December 1992
Volume 24 Issue 4 

Research aimed at correcting words in text has focused on three progressively more difficult
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent
word correction. In response to the first problem, efficient pattern-matching and n-gram analysis
techniques have been developed for detecting strings that do not appear in a given word list. In
response to the second problem, a variety of general and application-specific spelling cor ...



2 Learning classifiers: LiveClassifier: creating hierarchical text classifiers through web
corpora

Chien-Chung Huang , Shui-Lung Chuang , Lee-Feng Chien 

Proceedings of the 13th conference on World Wide Web May 2004 

Many Web information services utilize techniques of information extraction(IE) to collect important 
facts from the Web. To create more advanced services, one possible method is to discover thematic 
information from the collected facts through text classification. However, most conventional text 
classification techniques rely on manual-labelled corpora and are thus ill-suited to cooperate with 
Web information services with open domains. In this work, we present a system named LiveClassifier 
that ... 



3 Ontological user profiling in recommender systems

Stuart E. Middleton , Nigel R. Shadbolt , David C. De Roure 

ACM Transactions on Information Systems (TOIS) January 2004 

Volume 22 Issue 1 

We explore a novel ontological approach to user profiling within recommender systems, working on 
the problem of recommending on-line academic research papers. Our two experimental systems, 
Quickstep and Foxtrot, create user profiles from unobtrusively monitored behaviour and relevance 
feedback, representing the profiles in terms of a research paper topic ontology. A novel profile 
visualization approach is taken to acquire profile feedback. Research papers are classified using 
ontological classes ... 



4 A model of multimedia information retrieval 

Carlo Meghini , Fabrizio Sebastiani , Umberto Straccia
Journal of the ACM (JACM) September 2001
Volume 48 Issue 5 

Research on multimedia information retrieval (MIR) has recently witnessed a booming interest. A
prominent feature of this research trend is its simultaneous but independent materialization within
several fields of computer science. The resulting richness of paradigms, methods and systems may,
on the long run, result in a fragmentation of efforts and slow down progress. The primary goal of this
study is to promote an integration of methods and techniques for MIR by contributing a conceptual 
model ... 



5 Data clustering: a review 

A. K. Jain , M. N. Murty , P. J. Flynn 
ACM Computing Surveys (CSUR) September 1999 
Volume 31 Issue 3 

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) 
into groups (clusters). The clustering problem has been addressed in many contexts and by 
researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in
exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences
in assumptions and contexts in different communities has made the transfer of useful generic co ... 



6 Knowledge and representation: Leveraging a common representation for
personalized search and summarization in a medical digital library

Kathleen R. McKeown , Noemie Elhadad , Vasileios Hatzivassiloglou 

Proceedings of the third ACM/IEEE-CS joint conference on Digital libraries May 2003 

Despite the large amount of online medical literature, it can be difficult for clinicians to find relevant 
information at the point of patient care. In this paper, we present techniques to personalize the 
results of search, making use of the online patient record as a sophisticated, pre-existing user model. 
Our work in PERSIVAL, a medical digital library, includes methods for re-ranking the results of search 
to prioritize those that better match the patient record. It also generates summa ... 

7 Supporting cooperative and personal surfing with a desktop assistant 

Hannes Marais , Krishna Bharat 

Proceedings of the 10th annual ACM symposium on User interface software and technology 

October 1997 



8 A multilevel approach to intelligent information filtering: model, system, and
evaluation

J. Mostafa , S. Mukhopadhyay , M. Palakal , W. Lam 

ACM Transactions on Information Systems (TOIS) October 1997 

Volume 15 Issue 4 

In information-filtering environments, uncertainties associated with changing interests of the user 
and the dynamic document stream must be handled efficiently. In this article, a filtering model is 
proposed that decomposes the overall task into subsystem functionalities and highlights the need for 
multiple adaptation techniques to cope with uncertainties. A filtering system, SIFTER, has been 
implemented based on the model, using established techniques in information retrieval and 
artificia ... 

9 System section: Computer vision techniques for PDA accessibility of in-house video
surveillance

Rita Cucchiara , Costantino Grana , Andrea Prati , Roberto Vezzani 

First ACM SIGMM international workshop on Video surveillance November 2003

In this paper we propose an approach to indoor environment surveillance and, in particular, to people
behaviour control in home automation context. The reference application is a silent and automatic
control of the behaviour of people living alone in the house and specially conceived for people with
limited autonomy (e.g., elders or disabled people). The aim is to detect dangerous events (such as a
person falling down) and to react to these events by establishing a remote connection with low-par ...

10 Semantic annotation and integration: Towards the self-annotating web

Philipp Cimiano , Siegfried Handschuh , Steffen Staab

Proceedings of the 13th conference on World Wide Web May 2004

The success of the Semantic Web depends on the availability of ontologies as well as on the 
proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial 
question is where to acquire these metadata from. In this paper we propose PANKOW (Pattern-based
Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based
approach to categorize instances with regard to an ontology. The approach is evaluated against the 
manual annotations ... 

11 Message classification in the call center 

Stephan Busemann , Sven Schmeier , Roman G. Arens 
Proceedings of the sixth conference on Applied natural language processing April 2000

Customer care in technical domains is increasingly based on e-mail communication, allowing for the 
reproduction of approved solutions. Identifying the customer's problem is often time-consuming, as 
the problem space changes if new products are launched. This paper describes a new approach to the 
classification of e-mail requests based on shallow text processing and machine learning techniques. It 
is implemented within an assistance system for call center agents that is used in a commercial 
setti ... 

12 Special issue on word sense disambiguation: Introduction to the special issue on
word sense disambiguation: the state of the art

Nancy Ide , Jean Veronis 
Computational Linguistics March 1998 
Volume 24 Issue 1 

13 Evaluating message understanding systems: an analysis of the third message
understanding conference (MUC-3)

Nancy Chinchor , David D. Lewis , Lynette Hirschman 
Computational Linguistics September 1993 
Volume 19 Issue 3 

This paper describes and analyzes the results of the Third Message Understanding Conference (MUC- 
3). It reviews the purpose, history, and methodology of the conference, summarizes the participating 
systems, discusses issues of measuring system effectiveness, describes the linguistic phenomena 
tests, and provides a critical look at the evaluation in terms of the lessons learned. One of the 
common problems with evaluations is that the statistical significance of the results is unknown. In the 
disc ... 

14 Dialogue act modeling for automatic tagging and recognition of conversational
speech 

Andreas Stolcke , Noah Coccaro , Rebecca Bates , Paul Taylor , Carol Van Ess-Dykema , Klaus Ries , 
Elizabeth Shriberg , Daniel Jurafsky , Rachel Martin , Marie Meteer 
Computational Linguistics September 2000 
Volume 26 Issue 3 

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech- 
act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DISAGREEMENT, and 
APOLOGY. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic 
cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is 
based on treating the discourse structure of a conversation as a hidden ... 



15 Data streams (DS): Discovering decision rules from numerical data streams 






Francisco Ferrer-Troyano , Jesus S. Aguilar-Ruiz , Jose C. Riquelme 

Proceedings of the 2004 ACM symposium on Applied computing March 2004 

This paper presents a scalable learning algorithm to classify numerical, low dimensionality, high- 
cardinality, time-changing data streams. Our approach, named SCALLOP, provides a set of decision 
rules on demand which improves its simplicity and helpfulness for the user. SCALLOP updates the 
knowledge model every time a new example is read, adding interesting rules and removing out-of- 
date rules. As the model is dynamic, it maintains the tendency of data. Experimental results with 
synthetic data s ... 



16 Special issue on learning from imbalanced datasets: Mining with rarity: a unifying
framework

Gary Weiss 

ACM SIGKDD Explorations Newsletter June 2004 
Volume 6 Issue 1 

Rare objects are often of great interest and great value. Until recently, however, rarity has not 
received much attention in the context of data mining. Now, as increasingly complex real-world 
problems are addressed, rarity, and the related problem of imbalanced data, are taking center stage. 
This article discusses the role that rare classes and rare cases play in data mining. The problems that 
can result from these two forms of rarity are described in detail, as are methods for addressing 
these ... 



17 Maximum likelihood estimation for filtering thresholds 

Yi Zhang , Jamie Callan

Proceedings of the 24th annual international ACM SIGIR conference on Research and
development in information retrieval September 2001 

Information filtering systems based on statistical retrieval models usually compute a numeric score 
indicating how well each document matches each profile. Documents with scores above profile-specific
dissemination thresholds are delivered. An optimal dissemination threshold is one that
maximizes a given utility function based on the distributions of the scores of relevant and non- 
relevant documents. The parameters of the distribution can be estimated using releva ... 



18 Reports from related meetings: Interface '99: a data mining overview 

Arnold Goodman 

ACM SIGKDD Explorations Newsletter January 2000 
Volume 1 Issue 2 

This personal overview of Interface '99 is intended to communicate its meaning and relevance to 
SIGKDD, as well as provide valuable information on trends within the Interface for data miners 
seeking to learn more about statistics. In addition, it is the newest link in a bridge between the 
Interface and KDD begun by References 2-4 and the sessions on KDD at Interface '98 and Interface 
'99. 



19 Survey articles: Web usage mining: discovery and applications of usage patterns
from Web data

Jaideep Srivastava , Robert Cooley , Mukund Deshpande , Pang-Ning Tan 
ACM SIGKDD Explorations Newsletter January 2000 
Volume 1 Issue 2 

Web usage mining is the application of data mining techniques to discover usage patterns from Web 
data, in order to understand and better serve the needs of Web-based applications. Web usage 
mining consists of three phases, namely preprocessing, pattern discovery , and pattern analysis. This 
paper describes each of these phases in detail. Given its application potential, Web usage mining has 
seen a rapid increase in interest, from both the research and practice communities. This pap ... 



20 Visualization: Analysis of visualisation requirements for fuzzy systems 

Binh Pham , Ross Brown 

Proceedings of the 1st international conference on Computer graphics and interactive 
techniques in Australasia and South East Asia February 2003



This paper provides a comprehensive analysis of the working and requirements of fuzzy systems with
the view to devise appropriate visualisation framework and techniques for these systems using a 
user- and task-oriented approach. We firstly discuss the nature of fuzzy data and the essential 
components of typical fuzzy systems, then categorise different visualisation requirements from three 
perspectives: user of fuzzy systems, designer of fuzzy systems and designer of visualisation systems. 
The vi ... 
21 Accepted Posters: Beyond broadcast

Kevin Livingston , Mark Dredze , Kristian Hammond , Larry Birnbaum

Proceedings of the 8th international conference on Intelligent user interfaces January 2003 
The work presented in this paper takes a novel approach to the task of providing information to 
viewers of broadcast news. Instead of considering the broadcast news as the end product, this work 
uses it as a starting point to dynamically build an information space for the user to explore. This 
information space is designed to satisfy the users information needs, by containing more breadth, 
depth, and points of view than the original broadcast story. The architecture and current 
implementation ar ... 



22 Papers: collaborating through documents: Augmenting shared personal calendars

Joe Tullio , Jeremy Goecks , Elizabeth D. Mynatt , David H. Nguyen 

Proceedings of the 15th annual ACM symposium on User interface software and technology 

October 2002 

In this paper, we describe Augur, a groupware calendar system to support personal calendaring 
practices, informal workplace communication, and the socio-technical evolution of the calendar 
system within a workgroup. Successful design and deployment of groupware calendar systems have 
been shown to depend on several converging, interacting perspectives. We describe calendar-based 
work practices as viewed from these perspectives, and present the Augur system in support of them. 
Augur allows users t ... 



23 Poster session: Automated learning of model classifications 

Cheuk Yiu Ip , William C. Regli , Leonard Sieger , Ali Shokoufandeh

Proceedings of the eighth ACM symposium on Solid modeling and applications June 2003 
This paper describes a new approach to automate the classification of solid models using machine 
learning techniques. Existing approaches, based on group technology, fixed matching algorithms or 
pre-defined feature sets, impose a priori categorization schemes on engineering data or require 
significant human labeling of design data. This paper describes a shape learning algorithm and a 
general technique for "teaching" the algorithm to identify new or hidden classifications that are 
relevant in ma ... 



24 Industry track papers: From run-time behavior to usage scenarios: an interaction- 
pattern mining approach 

Mohammad El-Ramly , Eleni Stroulia , Paul Sorenson 

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and 
data mining July 2002 

A key challenge facing IT organizations today is their evolution towards adopting e-business practices 
that gives rise to the need for reengineering their underlying software systems. Any reengineering 
effort has to be aware of the functional requirements of the subject system, in order not to violate 
the integrity of its intended uses. However, as software systems get regularly maintained throughout 
their lifecycle, the documentation of their requirements often become obsolete or get lost. To a ... 



25 Machine learning in automated text categorization 

Fabrizio Sebastiani
ACM Computing Surveys (CSUR) March 2002
Volume 34 Issue 1 

The automated categorization (or classification) of texts into predefined categories has witnessed a 
booming interest in the last 10 years, due to the increased availability of documents in digital form 
and the ensuing need to organize them. In the research community the dominant approach to this 
problem is based on machine learning techniques: a general inductive process automatically builds a 
classifier by learning, from a set of preclassified documents, the characteristics of the categories. ... 



26 Scaling question answering to the web 

Cody Kwok , Oren Etzioni , Daniel S. Weld 
ACM Transactions on Information Systems (TOIS) July 2001
Volume 19 Issue 3 

The wealth of information on the web makes it an attractive resource for seeking quick answers to 
simple, factual questions such as "who was the first American in space?" or "what is the second
tallest mountain in the world?" Yet today's most advanced web search services (e.g., Google and
AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend 
question-answering techniques, first studied in the information retrieval literature ... 



27 Video Retrieval and Browsing: Comparing discriminating transformations and SVM
for learning during multimedia retrieval

Xiang Sean Zhou , Thomas S. Huang 

Proceedings of the ninth ACM international conference on Multimedia October 2001 

On-line learning or "relevance feedback" techniques for multimedia information retrieval have been
explored from many different points of view: from early heuristic-based feature weighting schemes to 
recently proposed optimal learning algorithms, probabilistic/Bayesian learning algorithms, boosting 
techniques, discriminant-EM algorithm, support vector machine, and other kernel-based learning 
machines. Based on a careful examination of the problem and a detailed analysis of the existing 
solutions ... 



28 Scaling question answering to the Web 

Cody C. T. Kwok , Oren Etzioni , Daniel S. Weld 
Proceedings of the tenth international conference on World Wide Web April 2001



29 Temporal sequence learning and data reduction for anomaly detection 

Terran Lane , Carla E. Brodley

ACM Transactions on Information and System Security (TISSEC) August 1999 
Volume 2 Issue 3 

The anomaly-detection problem can be formulated as one of learning to characterize the behaviors of 
an individual, system, or network in terms of temporal sequences of discrete data. We present an 
approach on the basis of instance-based learning (IBL) techniques. To cast the anomaly-detection 
task in an IBL framework, we employ an approach that transforms temporal sequences of discrete, 
unordered observations into a metric space via a similarity measure that encodes intra-attribute 
depende ... 



30 User interactions with everyday applications as context for just-in-time information
access

Jay Budzik , Kristian J. Hammond 

Proceedings of the 5th international conference on Intelligent user interfaces January 2000 
Our central claim is that user interactions with everyday productivity applications (e.g., word
processors, Web browsers, etc.) provide rich contextual information that can be leveraged to support
just-in-time access to task-relevant information. We discuss the requirements for such systems, and
develop a general architecture for systems of this type. As evidence for our claim, we present
Watson, a system which gathers contextual information in the form of the text of the document the
user ... 

31 The FINITE STRING Newsletter: Abstracts of current literature
Computational Linguistics Staff

Computational Linguistics January 1987
Volume 13 Issue 1-2 

32 The FINITE STRING newsletter: Abstracts of current literature

Computational Linguistics Staff
Computational Linguistics April 1986
Volume 12 Issue 2 

33 Challenges in information retrieval and language modeling: report of a workshop
held at the center for intelligent information retrieval, University of Massachusetts
Amherst, September 2002

James Allan , Jay Aslam , Nicholas Belkin , Chris Buckley , Jamie Callan , Bruce Croft , Sue Dumais , 
Norbert Fuhr , Donna Harman , David J. Harper , Djoerd Hiemstra , Thomas Hofmann , Eduard Hovy , 
Wessel Kraaij , John Lafferty , Victor Lavrenko , David Lewis , Liz Liddy , R. Manmatha , Andrew
McCallum , Jay Ponte , John Prager , Dragomir Radev , Philip Resnik , Stephen Robertson , Roni
Rosenfeld , Salim Roukos , Mark Sanderson , Rich Schwartz , Amit Singhal , Alan Smeaton , Howard
Turtle , Ellen Voorhees , Ralph Weischedel , Jinxi Xu , ChengXiang Zhai
ACM SIGIR Forum April 2003
Volume 37 Issue 1 

34 Evolving data mining into solutions for insights: Scaling mining algorithms to large
databases

Paul Bradley , Johannes Gehrke , Raghu Ramakrishnan , Ramakrishnan Srikant 
Communications of the ACM August 2002 
Volume 45 Issue 8 

Which insights about data structure make it possible to analyze the very large databases collected by
Internet, business, scientific, and government applications? 

35 Description and Analysis: ChangeDetector™: a site-level monitoring tool for the
WWW

Vijay Boyapati , Kristie Chevrier , Avi Finkel , Natalie Glance , Tom Pierce , Robert Stockton , Chip 
Whitmer 

Proceedings of the eleventh international conference on World Wide Web May 2002

This paper presents a new challenge for Web monitoring tools: to build a system that can monitor 
entire web sites effectively. Such a system could potentially be used to discover "silent news" hidden 
within corporate web sites. Examples of silent news include reorganizations in the executive team of 
a company or in the retirement of a product line. ChangeDetector, an implemented prototype,
addresses this challenge by incorporating a number of machine learning techniques. The principal 
backend co ... 



36 The proposed new Computing Reviews classification scheme 

Anthony Ralston 

Communications of the ACM July 1981 
Volume 24 Issue 7 



37 The new (1982) Computing Reviews classification system— final version 

Jean E. Sammet , Anthony Ralston 
Communications of the ACM January 1982
Volume 25 Issue 1 



38 A learning agent for wireless news access 

Daniel Billsus , Michael J. Pazzani , James Chen

Proceedings of the 5th international conference on Intelligent user interfaces January 2000 
We describe a user interface for wireless information devices, specifically designed to facilitate 
learning about users' individual interests in daily news stories. User feedback is collected
unobtrusively to form the basis for a content-based machine learning algorithm. As a result, the 
described system can adapt to users' individual interests, reduce the amount of information that 
needs to be transmitted, and help users access relevant information with minimal effort. 



39 Programming by demonstration: an inductive learning formulation 

Tessa A. Lau , Daniel S. Weld 

Proceedings of the 4th international conference on Intelligent user interfaces December 1998 



40 Interactive two-handed gesture interface in 3D virtual environments 

Hiroaki Nishino , Kouichi Utsumiya , Daisuke Kuraoka , Kenji Yoshioka , Kazuyoshi Korida
Proceedings of the ACM symposium on Virtual reality software and technology September 1997 
41 Detection of shifts in user interests for personalized information filtering

W. Lam , S. Mukhopadhyay , J. Mostafa , M. Palakal 

Proceedings of the 19th annual international ACM SIGIR conference on Research and 
development in information retrieval August 1996 



42 Mining scientific data 

Usama Fayyad , David Haussler , Paul Stolorz
Communications of the ACM November 1996 
Volume 39 Issue 11 



43 A multiparadigmatic environment for interacting with databases

T. Catarci , M. F. Costabile , A. Massari , L. Saladini , G. Santucci 
ACM SIGCHI Bulletin July 1996
Volume 28 Issue 3 

We present a prototype system to be used for visually accessing heterogeneous databases. The basic 
idea is to provide the user with several visual representations of data as well as multiple interaction 
mechanisms for both querying databases and visualizing the query results. Since some visual 
representations better fit certain user classes, the system adapts to the user's needs by switching to 
the most appropriate visual representation and interaction mechanism, according to a suitable user 
mod ... 



44 Pen computing: a technology overview and a vision 

Andre Meyer 
ACM SIGCHI Bulletin July 1995
Volume 27 Issue 3 

This work gives an overview of a new technology that is attracting growing interest in public as well 
as in the computer industry itself. The visible difference from other technologies is in the use of a pen 
or pencil as the primary means of interaction between a user and a machine, picking up the familiar 
pen and paper interface metaphor. From this follows a set of consequences that will be analyzed and 
put into context with other emerging technologies and visions. Starting with a short historic ... 



45 Automated cataloging and analysis of sky survey image databases: the SKICAT 
system 

Usama M. Fayyad , Nicholas Weir , S. Djorgovski 

Proceedings of the second international conference on Information and knowledge 
management December 1993 



Results 41 - 45 of 45 short listing 




The ACM Portal is published by the Association for Computing Machinery. Copyright © 2004 ACM, Inc.



US Patent & Trademark Office 



Search Results 







Search Results for: [ query <and> context <and> database <and> classifier <and> context 
<and> vector <and> interaction <and> history ] 
Found 40 of 139,988 searched.




Results 1 - 20 of 40 short listing 




1 Technique for automatically correcting words in text

Karen Kukich 

ACM Computing Surveys (CSUR) December 1992
Volume 24 Issue 4

Research aimed at correcting words in text has focused on three progressively more difficult
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent
word correction. In response to the first problem, efficient pattern-matching and n-gram analysis
techniques have been developed for detecting strings that do not appear in a given word list. In 
response to the second problem, a variety of general and application-specific spelling cor ... 



2 Learning classifiers: LiveClassifier: creating hierarchical text classifiers through web
corpora

Chien-Chung Huang , Shui-Lung Chuang , Lee-Feng Chien 

Proceedings of the 13th conference on World Wide Web May 2004 

Many Web information services utilize techniques of information extraction (IE) to collect important
facts from the Web. To create more advanced services, one possible method is to discover thematic
information from the collected facts through text classification. However, most conventional text
classification techniques rely on manual-labelled corpora and are thus ill-suited to cooperate with
Web information services with open domains. In this work, we present a system named LiveClassifier 
that ... 



3 Computational models: Biologically inspired rule-based multiset programming
paradigm for soft-computing

E. V. Krishnamurthy , V. K. Murthy , Vikram Krishnamurthy 

Proceedings of the first conference on computing frontiers on Computing frontiers April 2004 
This paper describes a rule-based multiset programming paradigm, as a unifying theme for biological, 
chemical, DNA, physical and molecular computations. The computations are interpreted as the 
outcome arising out of deterministic, nondeterministic or stochastic interaction among elements in a 
multiset object space which includes the environment. These interactions are like chemical reactions 
and the evolution of the multiset can mimic the biological evolution. Since the reaction rules are 
inhere ... 



4 Ontological user profiling in recommender systems

Stuart E. Middleton , Nigel R. Shadbolt , David C. De Roure 
ACM Transactions on Information Systems (TOIS) January 2004 
Volume 22 Issue 1 

We explore a novel ontological approach to user profiling within recommender systems, working on
the problem of recommending on-line academic research papers. Our two experimental systems,
Quickstep and Foxtrot, create user profiles from unobtrusively monitored behaviour and relevance
feedback, representing the profiles in terms of a research paper topic ontology. A novel profile 
visualization approach is taken to acquire profile feedback. Research papers are classified using 
ontological classes ... 



5 A model of multimedia information retrieval 

Carlo Meghini , Fabrizio Sebastiani , Umberto Straccia 
Journal of the ACM (JACM) September 2001 
Volume 48 Issue 5 

Research on multimedia information retrieval (MIR) has recently witnessed a booming interest. A
prominent feature of this research trend is its simultaneous but independent materialization within
several fields of computer science. The resulting richness of paradigms, methods and systems may,
on the long run, result in a fragmentation of efforts and slow down progress. The primary goal of this
study is to promote an integration of methods and techniques for MIR by contributing a conceptual
model ... 



6 Data clustering: a review 

A. K. Jain , M. N. Murty , P. J. Flynn 
ACM Computing Surveys (CSUR) September 1999 
Volume 31 Issue 3 

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) 
into groups (clusters). The clustering problem has been addressed in many contexts and by 
researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in
exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences
in assumptions and contexts in different communities has made the transfer of useful generic co ...

7 Knowledge and representation: Leveraging a common representation for
personalized search and summarization in a medical digital library

Kathleen R. McKeown , Noemie Elhadad , Vasileios Hatzivassiloglou

Proceedings of the third ACM/IEEE-CS joint conference on Digital libraries May 2003 

Despite the large amount of online medical literature, it can be difficult for clinicians to find relevant
information at the point of patient care. In this paper, we present techniques to personalize the 
results of search, making use of the online patient record as a sophisticated, pre-existing user model. 
Our work in PERSIVAL, a medical digital library, includes methods for re-ranking the results of search 
to prioritize those that better match the patient record. It also generates summa ... 



8 Efficient algorithms for geometric optimization 

Pankaj K. Agarwal , Micha Sharir
ACM Computing Surveys (CSUR) December 1998
Volume 30 Issue 4 

We review the recent progress in the design of efficient algorithms for various problems in geometric 
optimization. We present several techniques used to attack these problems, such as parametric 
searching, geometric alternatives to parametric searching, prune-and-search techniques for linear 
programming and related problems, and LP-type problems and their efficient solution. We then 
describe a wide range of applications of these and other techniques to numerous problems in 
geometric optim ... 



9 Supporting cooperative and personal surfing with a desktop assistant



Hannes Marais , Krishna Bharat 

Proceedings of the 10th annual ACM symposium on User interface software and technology 

October 1997 



10 A multilevel approach to intelligent information filtering: model, system, and
evaluation 

J. Mostafa , S. Mukhopadhyay , M. Palakal , W. Lam 

ACM Transactions on Information Systems (TOIS) October 1997 

Volume 15 Issue 4 

In information-filtering environments, uncertainties associated with changing interests of the user 
and the dynamic document stream must be handled efficiently. In this article, a filtering model is 
proposed that decomposes the overall task into subsystem functionalities and highlights the need for 
multiple adaptation techniques to cope with uncertainties. A filtering system, SIFTER, has been 
implemented based on the model, using established techniques in information retrieval and
artificia ... 



11 Semantic annotation and integration: Towards the self-annotating web 

Philipp Cimiano , Siegfried Handschuh , Steffen Staab 
Proceedings of the 13th conference on World Wide Web May 2004 

The success of the Semantic Web depends on the availability of ontologies as well as on the 
proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial 
question is where to acquire these metadata from. In this paper we propose PANKOW (Pattern-based
Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based
approach to categorize instances with regard to an ontology. The approach is evaluated against the 
manual annotations ... 



12 Message classification in the call center 

Stephan Busemann , Sven Schmeier , Roman G. Arens 
Proceedings of the sixth conference on Applied natural language processing April 2000

Customer care in technical domains is increasingly based on e-mail communication, allowing for the 
reproduction of approved solutions. Identifying the customer's problem is often time-consuming, as 
the problem space changes if new products are launched. This paper describes a new approach to the 
classification of e-mail requests based on shallow text processing and machine learning techniques. It 
is implemented within an assistance system for call center agents that is used in a commercial 
setti ... 



13 Special issue on word sense disambiguation: Introduction to the special issue on 
word sense disambiguation: the state of the art

Nancy Ide , Jean Veronis 
Computational Linguistics March 1998 
Volume 24 Issue 1 



14 Evaluating message understanding systems: an analysis of the third message 
understanding conference (MUC-3)

Nancy Chinchor , David D. Lewis , Lynette Hirschman 
Computational Linguistics September 1993 
Volume 19 Issue 3 

This paper describes and analyzes the results of the Third Message Understanding Conference (MUC- 
3). It reviews the purpose, history, and methodology of the conference, summarizes the participating 
systems, discusses issues of measuring system effectiveness, describes the linguistic phenomena 
tests, and provides a critical look at the evaluation in terms of the lessons learned. One of the 
common problems with evaluations is that the statistical significance of the results is unknown. In the 
disc ... 



15 Dialogue act modeling for automatic tagging and recognition of conversational
speech

Andreas Stolcke , Noah Coccaro , Rebecca Bates , Paul Taylor , Carol Van Ess-Dykema , Klaus Ries ,
Elizabeth Shriberg , Daniel Jurafsky , Rachel Martin , Marie Meteer 
Computational Linguistics September 2000 
Volume 26 Issue 3 

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech- 
act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DISAGREEMENT, and 
APOLOGY. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic 
cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is 
based on treating the discourse structure of a conversation as a hidden ... 

16 Data streams (DS): Discovering decision rules from numerical data streams 

Francisco Ferrer-Troyano , Jesus S. Aguilar-Ruiz , Jose C. Riquelme 
Proceedings of the 2004 ACM symposium on Applied computing March 2004 

This paper presents a scalable learning algorithm to classify numerical, low dimensionality, high- 
cardinality, time-changing data streams. Our approach, named SCALLOP, provides a set of decision 
rules on demand which improves its simplicity and helpfulness for the user. SCALLOP updates the 
knowledge model every time a new example is read, adding interesting rules and removing out-of-
date rules. As the model is dynamic, it maintains the tendency of data. Experimental results with 
synthetic data s ... 

17 Maximum likelihood estimation for filtering thresholds

Yi Zhang , Jamie Callan 

Proceedings of the 24th annual international ACM SIGIR conference on Research and
development in information retrieval September 2001 

Information filtering systems based on statistical retrieval models usually compute a numeric score 
indicating how well each document matches each profile. Documents with scores above profile-
specific dissemination thresholds are delivered. An optimal dissemination threshold is one that
maximizes a given utility function based on the distributions of the scores of relevant and non- 
relevant documents. The parameters of the distribution can be estimated using releva ... 



18 Phase tracking and prediction 

Timothy Sherwood , Suleyman Sair , Brad Calder 

ACM SIGARCH Computer Architecture News , Proceedings of the 30th annual international 
symposium on Computer architecture May 2003 
Volume 31 Issue 2 

In a single second a modern processor can execute billions of instructions. Obtaining a bird's eye 
view of the behavior of a program at these speeds can be a difficult task when all that is available is 
cycle by cycle examination. In many programs, behavior is anything but steady state, and 
understanding the patterns of behavior, at run-time, can unlock a multitude of optimization 
opportunities. In this paper, we present a unified profiling architecture that can efficiently capture, 
classify, and ... 

19 Survey articles: Web usage mining: discovery and applications of usage patterns 
from Web data

Jaideep Srivastava , Robert Cooley , Mukund Deshpande , Pang-Ning Tan 
ACM SIGKDD Explorations Newsletter January 2000 
Volume 1 Issue 2 

Web usage mining is the application of data mining techniques to discover usage patterns from Web 
data, in order to understand and better serve the needs of Web-based applications. Web usage 
mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This 
paper describes each of these phases in detail. Given its application potential, Web usage mining has 
seen a rapid increase in interest, from both the research and practice communities. This pap ... 



20 Visualization: Analysis of visualisation requirements for fuzzy systems 

Binh Pham , Ross Brown 



Proceedings of the 1st international conference on Computer graphics and interactive
techniques in Australasia and South East Asia February 2003

This paper provides a comprehensive analysis of the working and requirements of fuzzy systems with 
the view to devise appropriate visualisation framework and techniques for these systems using a 
user- and task-oriented approach. We firstly discuss the nature of fuzzy data and the essential 
components of typical fuzzy systems, then categorise different visualisation requirements from three 
perspectives: user of fuzzy systems, designer of fuzzy systems and designer of visualisation systems. 
The vl ... 
21 Accepted Posters: Beyond broadcast

Kevin Livingston , Mark Dredze , Kristian Hammond , Larry Birnbaum 

Proceedings of the 8th international conference on Intelligent user interfaces January 2003 
The work presented in this paper takes a novel approach to the task of providing information to 
viewers of broadcast news. Instead of considering the broadcast news as the end product, this work 
uses it as a starting point to dynamically build an information space for the user to explore. This 
information space is designed to satisfy the user's information needs, by containing more breadth,
depth, and points of view than the original broadcast story. The architecture and current 
implementation ar ... 



22 Poster session: Automated learning of model classifications 

Cheuk Yiu Ip , William C. Regli , Leonard Sieger , Ali Shokoufandeh 
Proceedings of the eighth ACM symposium on Solid modeling and applications June 2003

This paper describes a new approach to automate the classification of solid models using machine 
learning techniques. Existing approaches, based on group technology, fixed matching algorithms or 
pre-defined feature sets, impose a priori categorization schemes on engineering data or require 
significant human labeling of design data. This paper describes a shape learning algorithm and a 
general technique for "teaching" the algorithm to identify new or hidden classifications that are 
relevant in ma ... 



23 Machine learning in automated text categorization 

Fabrizio Sebastiani
ACM Computing Surveys (CSUR) March 2002
Volume 34 Issue 1 

The automated categorization (or classification) of texts into predefined categories has witnessed a 
booming interest in the last 10 years, due to the increased availability of documents in digital form
and the ensuing need to organize them. In the research community the dominant approach to this
problem is based on machine learning techniques: a general inductive process automatically builds a
classifier by learning, from a set of preclassified documents, the characteristics of the categories. ... 



24 Scaling question answering to the web 






Cody Kwok , Oren Etzioni , Daniel S. Weld 

ACM Transactions on Information Systems (TOIS) July 2001 
Volume 19 Issue 3 

The wealth of information on the web makes it an attractive resource for seeking quick answers to 
simple, factual questions such as "who was the first American in space?" or "what is the second
tallest mountain in the world?" Yet today's most advanced web search services (e.g., Google and
AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend 
question-answering techniques, first studied in the information retrieval literature ... 



25 Video Retrieval and Browsing: Comparing discriminating transformations and SVM 
for learning during multimedia retrieval

Xiang Sean Zhou , Thomas S. Huang 

Proceedings of the ninth ACM international conference on Multimedia October 2001 

On-line learning or "relevance feedback" techniques for multimedia information retrieval have been 
explored from many different points of view: from early heuristic-based feature weighting schemes to 
recently proposed optimal learning algorithms, probabilistic/Bayesian learning algorithms, boosting 
techniques, discriminant-EM algorithm, support vector machine, and other kernel-based learning 
machines. Based on a careful examination of the problem and a detailed analysis of the existing 
solutions ... 



26 Scaling question answering to the Web 

Cody C. T. Kwok , Oren Etzioni , Daniel S. Weld 

Proceedings of the tenth international conference on World Wide Web April 2001 



27 Temporal sequence learning and data reduction for anomaly detection 

Terran Lane , Carla E. Brodley

ACM Transactions on Information and System Security (TISSEC) August 1999 
Volume 2 Issue 3

The anomaly-detection problem can be formulated as one of learning to characterize the behaviors of 
an individual, system, or network in terms of temporal sequences of discrete data. We present an 
approach on the basis of instance-based learning (IBL) techniques. To cast the anomaly-detection 
task in an IBL framework, we employ an approach that transforms temporal sequences of discrete, 
unordered observations into a metric space via a similarity measure that encodes intra-attribute 
depende ... 



28 User interactions with everyday applications as context for just-in-time information
access 

Jay Budzik , Kristian J. Hammond 

Proceedings of the 5th international conference on Intelligent user interfaces January 2000 
Our central claim is that user interactions with everyday productivity applications (e.g., word 
processors, Web browsers, etc.) provide rich contextual information that can be leveraged to support 
just-in-time access to task-relevant information. We discuss the requirements for such systems, and 
develop a general architecture for systems of this type. As evidence for our claim, we present 
Watson, a system which gathers contextual information in the form of the text of the document the 
user ... 



29 The FINITE STRING Newsletter: Abstracts of current literature
Computational Linguistics Staff 
Computational Linguistics January 1987 
Volume 13 Issue 1-2 

30 The FINITE STRING newsletter: Abstracts of current literature 

Computational Linguistics Staff 
Computational Linguistics April 1986 
Volume 12 Issue 2 



31 Challenges in information retrieval and language modeling: report of a workshop
held at the center for intelligent information retrieval, University of Massachusetts 
Amherst, September 2002 

James Allan , Jay Aslam , Nicholas Belkin , Chris Buckley , Jamie Callan , Bruce Croft , Sue Dumais , 
Norbert Fuhr , Donna Harman , David J. Harper , Djoerd Hiemstra , Thomas Hofmann , Eduard Hovy , 
Wessel Kraaij , John Lafferty , Victor Lavrenko , David Lewis , Liz Liddy , R. Manmatha , Andrew 
McCallum , Jay Ponte , John Prager , Dragomir Radev , Philip Resnik , Stephen Robertson , Roni
Rosenfeld , Salim Roukos , Mark Sanderson , Rich Schwartz , Amit Singhal , Alan Smeaton , Howard 
Turtle , Ellen Voorhees , Ralph Weischedel , Jinxi Xu , ChengXiang Zhai 
ACM SIGIR Forum April 2003 
Volume 37 Issue 1 

32 Description and Analysis: ChangeDetector™: a site-level monitoring tool for the
WWW

Vijay Boyapati , Kristie Chevrier , Avi Finkel , Natalie Glance , Tom Pierce , Robert Stockton , Chip 
Whitmer 

Proceedings of the eleventh international conference on World Wide Web May 2002 

This paper presents a new challenge for Web monitoring tools: to build a system that can monitor 
entire web sites effectively. Such a system could potentially be used to discover "silent news" hidden 
within corporate web sites. Examples of silent news include reorganizations in the executive team of 
a company or in the retirement of a product line. ChangeDetector, an implemented prototype, 
addresses this challenge by incorporating a number of machine learning techniques. The principal 
backend co ... 

33 The proposed new Computing Reviews classification scheme 

Anthony Ralston 

Communications of the ACM July 1981 
Volume 24 Issue 7 

34 The new (1982) Computing Reviews classification system— final version 

Jean E. Sammet , Anthony Ralston 
Communications of the ACM January 1982
Volume 25 Issue 1 

35 A learning agent for wireless news access 

Daniel Billsus , Michael J. Pazzani , James Chen
Proceedings of the 5th international conference on Intelligent user interfaces January 2000
We describe a user interface for wireless information devices, specifically designed to facilitate 
learning about users' individual interests in daily news stories. User feedback is collected
unobtrusively to form the basis for a content-based machine learning algorithm. As a result, the 
described system can adapt to users' individual interests, reduce the amount of information that 
needs to be transmitted, and help users access relevant information with minimal effort. 

36 Detection of shifts in user interests for personalized information filtering 

W. Lam , S. Mukhopadhyay , J. Mostafa , M. Palakal 

Proceedings of the 19th annual international ACM SIGIR conference on Research and
development in information retrieval August 1996 

37 Mining scientific data 

Usama Fayyad , David Haussler , Paul Stolorz 
Communications of the ACM November 1996
Volume 39 Issue 11 



38 A multiparadigmatic environment for interacting with databases 

T. Catarci , M. F. Costabile , A. Massari , L. Saladini , G. Santucci 
ACM SIGCHI Bulletin July 1996
Volume 28 Issue 3 

We present a prototype system to be used for visually accessing heterogeneous databases. The basic 
idea is to provide the user with several visual representations of data as well as multiple interaction 
mechanisms for both querying databases and visualizing the query results. Since some visual 
representations better fit certain user classes, the system adapts to the user's needs by switching to 
the most appropriate visual representation and interaction mechanism, according to a suitable user 
mod ... 



39 Pen computing: a technology overview and a vision 

Andre Meyer 

ACM SIGCHI Bulletin July 1995 
Volume 27 Issue 3 

This work gives an overview of a new technology that is attracting growing interest in public as well 
as in the computer industry itself. The visible difference from other technologies is in the use of a pen 
or pencil as the primary means of interaction between a user and a machine, picking up the familiar 
pen and paper interface metaphor. From this follows a set of consequences that will be analyzed and 
put into context with other emerging technologies and visions. Starting with a short historic ... 



40 Automated cataloging and analysis of sky survey image databases: the SKICAT 
system

Usama M. Fayyad , Nicholas Weir , S. Djorgovski 

Proceedings of the second international conference on Information and knowledge 
management December 1993
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1 Survey articles: Data mining for hypertext: a tutorial survey 

Soumen Chakrabarti 

ACM SIGKDD Explorations Newsletter January 2000 
Volume 1 Issue 2 

With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile 
ground for data mining research to make a difference to the effectiveness of information search. 
Today, Web surfers access the Web through two dominant interfaces: clicking on hyperlinks and 
searching via keyword queries. This process is often tentative and unsatisfactory. Better support is 
needed for expressing one's information need and dealing with a search result in more structured 
ways than av ... 

2 Selective sampling for example-based word sense disambiguation 

Atsushi Fujii , Takenobu Tokunaga , Kentaro Inui , Hozumi Tanaka 
Computational Linguistics December 1998 
Volume 24 Issue 4 

This paper proposes an efficient example sampling method for example-based word sense 
disambiguation systems. To construct a database of practical size, a considerable overhead for 
manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity 
of searching a large-sized database poses a considerable problem (overhead for search). To counter 
these problems, our method selectively samples a smaller-sized effective subset from a given 
example set for use in wor ... 



3 Machine learning in automated text categorization 
Fabrizio Sebastiani

ACM Computing Surveys (CSUR) March 2002
Volume 34 Issue 1 

The automated categorization (or classification) of texts into predefined categories has witnessed a 
booming interest in the last 10 years, due to the increased availability of documents in digital form 
and the ensuing need to organize them. In the research community the dominant approach to this 
problem is based on machine learning techniques: a general inductive process automatically builds a
classifier by learning, from a set of preclassified documents, the characteristics of the categories.



4 Learning classifiers: LiveClassifier: creating hierarchical text classifiers through web 82%
corpora

Chien-Chung Huang , Shui-Lung Chuang , Lee-Feng Chien

Proceedings of the 13th conference on World Wide Web May 2004 

Many Web information services utilize techniques of information extraction(IE) to collect important 
facts from the Web. To create more advanced services, one possible method is to discover thematic 
information from the collected facts through text classification. However, most conventional text 
classification techniques rely on manual-labelled corpora and are thus ill-suited to cooperate with 
Web information services with open domains. In this work, we present a system named LiveClassifier 
that... 



5 Data clustering: a review 
A. K. Jain , M. N. Murty , P. J. Flynn 
ACM Computing Surveys (CSUR) September 1999 
Volume 31 Issue 3 

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) 
into groups (clusters). The clustering problem has been addressed in many contexts and by
researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in 
exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences 
in assumptions and contexts in different communities has made the transfer of useful generic co ... 



6 Exploration of text collections with hierarchical feature maps
Dieter Merkl

ACM SIGIR Forum , Proceedings of the 20th annual international ACM SIGIR conference on 
Research and development in information retrieval July 1997 
Volume 31 Issue SI 



7 Special issue on word sense disambiguation: Disambiguating highly ambiguous 

words

Geoffrey Towell , Ellen M. Voorhees 
Computational Linguistics March 1998 
Volume 24 Issue 1 

A word sense disambiguator that is able to distinguish among the many senses of common words 
that are found in general-purpose, broad-coverage lexicons would be useful. For example, 
experiments have shown that, given accurate sense disambiguation, the lexical relations encoded in 
lexicons such as WordNet can be exploited to improve the effectiveness of information retrieval 
systems. This paper describes a classifier whose accuracy may be sufficient for such a purpose. The 
classifier combines the ... 



8 Special issue on special feature: Sufficient dimensionality reduction 
Amir Globerson , Naftali Tishby

The Journal of Machine Learning Research March 2003
Volume 3 

Dimensionality reduction of empirical co-occurrence data is a fundamental problem in unsupervised 
learning. It is also a well studied problem in statistics known as the analysis of cross-classified data. 
One principled approach to this problem is to represent the data in low dimension with minimal loss 
of (mutual) information contained in the original data. In this paper we introduce an information 
theoretic nonlinear method for finding such a most informative dimension reduction. In contrast wi ... 



9 Enhanced hypertext categorization using hyperlinks 80%

Soumen Chakrabarti , Byron Dom , Piotr Indyk 

ACM SIGMOD Record , Proceedings of the 1998 ACM SIGMOD international conference on 
Management of data June 1998 
Volume 27 Issue 2 

A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data 
that enables structured search using topic taxonomies, circumvents keyword ambiguity, and 
improves the quality of search and profile-based routing and filtering. Therefore, an accurate 
classifier is an essential component of a hypertext database. Hyperlinks pose new problems not 
addressed in the extensive text classification literature. Links clearly contain high-quality semantic 
clues that ... 



10 Automated techniques for managing collections: Machine learning for information
architecture in a large governmental website

Miles Efron , Jonathan Elsas , Gary Marchionini , Junliang Zhang 

Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries June 2004

This paper describes ongoing research into the application of machine learning techniques for
improving access to governmental information in complex digital libraries. Under the auspices of the 
GovStat Project, our goal is to identify a small number of semantically valid concepts that adequately 
spans the intellectual domain of a collection. The goal of this discovery is twofold. First we desire a 
practical aid for information architects. Second, automatically derived document-concept relations ... 



11 Word sense disambiguation of adjectives using probabilistic networks 

Gerald Chao , Michael G. Dyer 

Proceedings of the 17th conference on Computational linguistics - Volume 1 July 2000

In this paper, word sense disambiguation (WSD) accuracy achievable by a probabilistic classifier, 
using very minimal training sets, is investigated. We made the assumption that there are no tagged 
corpora available and identified what information, needed by an accurate WSD system, can and 
cannot be automatically obtained. The lesson learned can then be used to focus on what knowledge 
needs manual annotation. Our system, named Bayesian Hierarchical Disambiguator (BHD), uses the 
Internet, a ... 



12 Image retrieval: A bootstrapping approach to annotating large image collection
HuaMin Feng , Tat-Seng Chua

Proceedings of the 5th ACM SIGMM international workshop on Multimedia information
retrieval November 2003 

Huge amount of manual efforts are required to annotate large image/video archives with text 
annotations. Several recent works attempted to automate this task by employing supervised learning 
approaches to associate visual information extracted in segmented images with semantic concepts 
provided by associated text. The main limitation of such approaches, however, is that large labeled 
training corpus is still needed for effective learning, and semantically meaningful segmentation for 
images is in ... 



13 Special issue on word sense disambiguation: Topical clustering of MRD senses based 77% 

on information retrieval techniques
Jen Nan Chen , Jason S. Chang 
Computational Linguistics March 1998 
Volume 24 Issue 1 

This paper describes a heuristic approach capable of automatically clustering senses in a machine-
readable dictionary (MRD). Including these clusters in the MRD-based lexical database offers several
positive benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser
sense division, so unnecessarily fine sense distinction can be avoided. The clustered entries in the
MRD can also be used as materials for supervised training to develop a WSD system. Furthermore, if
t ... 



14 Special issue on word sense disambiguation: Introduction to the special issue on 
word sense disambiguation: the state of the art
Nancy Ide , Jean Veronis 
Computational Linguistics March 1998 
Volume 24 Issue 1 



15 Video Retrieval and Browsing: Comparing discriminating transformations and SVM 77% 
for learning during multimedia retrieval
Xiang Sean Zhou , Thomas S. Huang

Proceedings of the ninth ACM international conference on Multimedia October 2001

On-line learning or "relevance feedback" techniques for multimedia information retrieval have been 
explored from many different points of view: from early heuristic-based feature weighting schemes to 
recently proposed optimal learning algorithms, probabilistic/Bayesian learning algorithms, boosting 
techniques, discriminant-EM algorithm, support vector machine, and other kernel-based learning
machines. Based on a careful examination of the problem and a detailed analysis of the existing 
solutions ... 



16 Adaptive information filtering: detecting changes in text streams

Carsten Lanquillon , Ingrid Renz 

Proceedings of the eighth international conference on Information and knowledge 
management November 1999 

The task of information filtering is to classify documents from a stream as either relevant or non- 
relevant according to a particular user interest with the objective to reduce information load. When 
using an information filter in an environment that is changing with time, methods for adapting the 
filter should be considered in order to retain classification accuracy. We favor a methodology that 
attempts to detect changes and adapts the information filter only if inevitable in order to mini ... 



17 Content-based book recommending using learning for text categorization 

Raymond J. Mooney , Loriene Roy 

Proceedings of the fifth ACM conference on Digital libraries June 2000 

Recommender systems improve access to relevant products and information by making personalized 
suggestions based on previous examples of a user's likes and dislikes. Most existing recommender 
systems use collaborative filtering methods that base recommendations on other users' preferences. 
By contrast, content-based methods use information about an item itself to make suggestions. This
approach has the advantage of being able to recommend previously unrated items to users with 
unique interes ... 



18 Learnable visual keywords for image classification

Joo-Hwee Lim 

Proceedings of the fourth ACM conference on Digital libraries August 1999 



19 Web mining research: a survey 

Raymond Kosala , Hendrik Blockeel 
ACM SIGKDD Explorations Newsletter June 2000 
Volume 2 Issue 1 



20 Knowledge management session 4: indexing: Bootstrapping for hierarchical
document classification

Giordano Adami , Paolo Avesani , Diego Bona

Proceedings of the twelfth international conference on Information and knowledge 
management November 2003 

Managing the hierarchical organization of data is starting to play a key role in the knowledge
management community due to the great amount of human resources needed to create and maintain
these organized repositories of information. Machine learning community has in part addressed this
problem by developing hierarchical supervised classifiers that help maintainers to categorize new 
resources within given hierarchies. Although such learning models succeed in exploiting relational 
knowledge, they ... 
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21 Challenges in information retrieval and language modeling: report of a workshop 
held at the center for intelligent information retrieval, University of Massachusetts
Amherst, September 2002 

James Allan , Jay Aslam , Nicholas Belkin , Chris Buckley , Jamie Callan , Bruce Croft , Sue Dumais , 
Norbert Fuhr , Donna Harman , David J. Harper , Djoerd Hiemstra , Thomas Hofmann , Eduard Hovy , 
Wessel Kraaij , John Lafferty , Victor Lavrenko , David Lewis , Liz Liddy , R. Manmatha , Andrew 
McCallum , Jay Ponte , John Prager , Dragomir Radev , Philip Resnik , Stephen Robertson , Roni 
Rosenfeld , Salim Roukos , Mark Sanderson , Rich Schwartz , Amit Singhal , Alan Smeaton , Howard 
Turtle , Ellen Voorhees , Ralph Weischedel , Jinxi Xu , ChengXiang Zhai
ACM SIGIR Forum April 2003 
Volume 37 Issue 1 



22 Special Issue on Machine learning methods for text and images: Matching words and 77% 
pictures 

Kobus Barnard , Pinar Duygulu , David Forsyth , Nando de Freitas , David M. Blei , Michael I. Jordan 
The Journal of Machine Learning Research March 2003 
Volume 3 

We present a new approach for modeling multi-modal data sets, focusing on the specific case of 
segmented images with associated text. Learning the joint distribution of image regions and words 
has many applications. We consider in detail predicting words associated with whole images (auto- 
annotation) and corresponding to particular image regions (region naming). Auto-annotation might 
help organize and access large collections of images. Region naming is a model of object recognition 
as a process ... 



23 Learning with mixtures of trees 

Marina Meila , Michael I. Jordan 
The Journal of Machine Learning Research September 2001

Volume 1 

This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional
domains. Mixtures-of-trees generalize the probabilistic trees of Chow and Liu (1968) in a different
and complementary direction to that of Bayesian networks. We present efficient algorithms for 
learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss 
additional efficiencies that can be obtained when data are "sparse," and we present data structures 
and alg ... 



24 A comparative study for domain ontology guided feature extraction 

Bill B. Wang , R. I. Bob Mckay , Hussein A. Abbass , Michael Barlow

Proceedings of the twenty-sixth Australasian computer science conference on Conference in
research and practice in information technology - Volume 16 February 2003 

We introduced a novel method employing a hierarchical domain ontology structure to extract features 
representing documents in our previous publication (Wang 2002). All raw words in the training 
documents are mapped to concepts in a concept hierarchy derived from the domain ontology. Based 
on these concepts, a concept hierarchy is established for the training document space, using is-a 
relationships defined in the domain ontology. An optimum concept set may be obtained by searching 
the concept hi ... 



25 Scalable feature selection, classification and signature generation for organizing 
large text databases into hierarchical topic taxonomies

Soumen Chakrabarti , Byron Dom , Rakesh Agrawal , Prabhakar Raghavan 

The VLDB Journal — The International Journal on Very Large Data Bases August 1998 

Volume 7 Issue 3 

We explore how to organize large text databases hierarchically by topic to aid better searching, 
browsing and filtering. Many corpora, such as internet directories, digital libraries, and patent 
databases are manually organized into topic hierarchies, also called taxonomies. Similar to indices for 
relational data, taxonomies make search and access more efficient. However, the exponential growth 
in the volume of on-line textual information makes it nearly impossible to maintain such taxono ... 



26 Summarization: The use of unlabeled data to improve supervised learning for text 77%
summarization

Massih-Reza Amini , Patrick Gallinari 

Proceedings of the 25th annual international ACM SIGIR conference on Research and 

development in information retrieval August 2002 

With the huge amount of information available electronically, there is an increasing demand for 
automatic text summarization systems. The use of machine learning techniques for this task allows 
one to adapt summaries to the user needs and to the corpus characteristics. These desirable 
properties have motivated an increasing amount of work in this field over the last few years. Most 
approaches attempt to generate summaries by extracting sentence segments and adopt the 
supervised learning paradigm ... 



27 Fast supervised dimensionality reduction algorithm with applications to document
categorization & retrieval 

George Karypis , Eui-Hong (Sam) Han 

Proceedings of the ninth international conference on Information and knowledge management 

November 2000 



28 Hypertext data mining (tutorial AM-1) 

Soumen Chakrabarti 

Tutorial notes of the sixth ACM SIGKDD international conference on Knowledge discovery and 
data mining August 2000 



29 Classification and regression: money *can* grow on trees 77%

Johannes Gehrke , Wei-Yin Loh , Raghu Ramakrishnan

Tutorial notes of the fifth ACM SIGKDD international conference on Knowledge discovery and 

data mining August 1999 

With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile 
ground for data mining research to make a difference to the effectiveness of information search. 
Today, Web surfers access the Web through two dominant interfaces: clicking on hyperlinks and
searching via keyword queries. This process is often tentative and unsatisfactory. Better support is
needed for expressing one's information need and dealing with a search result in more structured 
ways than ... 



30 An evaluation of phrasal and clustered representations on a text categorization task 77% 

David D. Lewis 

Proceedings of the 15th annual international ACM SIGIR conference on Research and 
development in information retrieval June 1992 

Syntactic phrase indexing and term clustering have been widely explored as text representation 
techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing 
languages on a text categorization task, enabling us to study their properties in isolation from query 
interpretation issues. We show that optimal effectiveness occurs when using only a small proportion 
of the indexing terms available, and that effectiveness peaks at a higher feature set size and ... 
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Survey articles: Data mining for hypertext: a tutorial survey 87% 
Soumen Chakrabarti

ACM SIGKDD Explorations Newsletter January 2000 
Volume 1 Issue 2 

With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile
ground for data mining research to make a difference to the effectiveness of information search.
Today, Web surfers access the Web through two dominant interfaces: clicking on hyperlinks and 
searching via keyword queries. This process is often tentative and unsatisfactory. Better support is 
needed for expressing one's information need and dealing with a search result in more structured 
ways than av ... 

Machine learning in automated text categorization 82% 

Fabrizio Sebastiani

ACM Computing Surveys (CSUR) March 2002 
Volume 34 Issue 1 

The automated categorization (or classification) of texts into predefined categories has witnessed a 
booming interest in the last 10 years, due to the increased availability of documents in digital form 
and the ensuing need to organize them. In the research community the dominant approach to this 
problem is based on machine learning techniques: a general inductive process automatically builds a 
classifier by learning, from a set of preclassified documents, the characteristics of the categories. ... 

Data clustering: a review 82% 

A. K. Jain , M. N. Murty , P. J. Flynn 

ACM Computing Surveys (CSUR) September 1999 

Volume 31 Issue 3 

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors)
into groups (clusters). The clustering problem has been addressed in many contexts and by 
researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in 
exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences
in assumptions and contexts in different communities has made the transfer of useful generic co ... 



4 Special issue on word sense disambiguation: Disambiguating highly ambiguous 

words

Geoffrey Towell , Ellen M. Voorhees
Computational Linguistics March 1998 

Volume 24 Issue 1

A word sense disambiguator that is able to distinguish among the many senses of common words 
that are found in general-purpose, broad-coverage lexicons would be useful. For example,
experiments have shown that, given accurate sense disambiguation, the lexical relations encoded in 
lexicons such as WordNet can be exploited to improve the effectiveness of information retrieval 
systems. This paper describes a classifier whose accuracy may be sufficient for such a purpose. The 
classifier combines the ... 



Special issue on special feature: Sufficient dimensionality reduction 80% 

Amir Globerson , Naftali Tishby 

The Journal of Machine Learning Research March 2003 
Volume 3 

Dimensionality reduction of empirical co-occurrence data is a fundamental problem in unsupervised 
learning. It is also a well studied problem in statistics known as the analysis of cross-classified data. 
One principled approach to this problem is to represent the data in low dimension with minimal loss 
of (mutual) information contained in the original data. In this paper we introduce an information 
theoretic nonlinear method for finding such a most informative dimension reduction. In contrast wi ... 



6 Enhanced hypertext categorization using hyperlinks 
Soumen Chakrabarti , Byron Dom , Piotr Indyk 

ACM SIGMOD Record , Proceedings of the 1998 ACM SIGMOD international conference on 
Management of data June 1998 
Volume 27 Issue 2 

A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data 
that enables structured search using topic taxonomies, circumvents keyword ambiguity, and 
improves the quality of search and profile-based routing and filtering. Therefore, an accurate 
classifier is an essential component of a hypertext database. Hyperlinks pose new problems not 
addressed in the extensive text classification literature. Links clearly contain high-quality semantic 
clues that ... 



7 Selective sampling for example-based word sense disambiguation

Atsushi Fujii , Takenobu Tokunaga , Kentaro Inui , Hozumi Tanaka 
Computational Linguistics December 1998 
Volume 24 Issue 4 

This paper proposes an efficient example sampling method for example-based word sense 
disambiguation systems. To construct a database of practical size, a considerable overhead for 
manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity 
of searching a large-sized database poses a considerable problem (overhead for search). To counter 
these problems, our method selectively samples a smaller-sized effective subset from a given 
example set for use in wor ... 



8 Exploration of text collections with hierarchical feature maps 

Dieter Merkl 

ACM SIGIR Forum , Proceedings of the 20th annual international ACM SIGIR conference on 
Research and development in information retrieval July 1997
Volume 31 Issue SI 



Word sense disambiguation of adjectives using probabilistic networks 77%

Gerald Chao , Michael G. Dyer

Proceedings of the 17th conference on Computational linguistics - Volume 1 July 2000

In this paper, word sense disambiguation (WSD) accuracy achievable by a probabilistic classifier,
using very minimal training sets, is investigated. We made the assumption that there are no tagged
corpora available and identified what information, needed by an accurate WSD system, can and 
cannot be automatically obtained. The lesson learned can then be used to focus on what knowledge 
needs manual annotation. Our system, named Bayesian Hierarchical Disambiguator (BHD), uses the 
Internet, a ... 

10 Special issue on word sense disambiguation: Topical clustering of MRD senses based 
on information retrieval techniques

Jen Nan Chen , Jason S. Chang 

Computational Linguistics March 1998 

Volume 24 Issue 1 

This paper describes a heuristic approach capable of automatically clustering senses in a machine- 
readable dictionary (MRD). Including these clusters in the MRD-based lexical database offers several 
positive benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser 
sense division, so unnecessarily fine sense distinction can be avoided. The clustered entries in the 
MRD can also be used as materials for supervised training to develop a WSD system. Furthermore, if
t ... 



11 Special issue on word sense disambiguation: Introduction to the special issue on 

word sense disambiguation: the state of the art
Nancy Ide , Jean Veronis 

Computational Linguistics March 1998 
Volume 24 Issue 1 



12 Video Retrieval and Browsing: Comparing discriminating transformations and SVM 
for learning during multimedia retrieval

Xiang Sean Zhou , Thomas S. Huang 

Proceedings of the ninth ACM international conference on Multimedia October 2001 

On-line learning or "relevance feedback" techniques for multimedia information retrieval have been 
explored from many different points of view: from early heuristic-based feature weighting schemes to 
recently proposed optimal learning algorithms, probabilistic/Bayesian learning algorithms, boosting 
techniques, discriminant-EM algorithm, support vector machine, and other kernel-based learning 
machines. Based on a careful examination of the problem and a detailed analysis of the existing 
solutions ... 



13 Adaptive information filtering: detecting changes in text streams 

Carsten Lanquillon , Ingrid Renz

Proceedings of the eighth international conference on Information and knowledge
management November 1999 

The task of information filtering is to classify documents from a stream as either relevant or non- 
relevant according to a particular user interest with the objective to reduce information load. When 
using an information filter in an environment that is changing with time, methods for adapting the 
filter should be considered in order to retain classification accuracy. We favor a methodology that 
attempts to detect changes and adapts the information filter only if inevitable in order to mini ... 



14 Content-based book recommending using learning for text categorization 

Raymond J. Mooney , Loriene Roy

Proceedings of the fifth ACM conference on Digital libraries June 2000 

Recommender systems improve access to relevant products and information by making personalized 
suggestions based on previous examples of a user's likes and dislikes. Most existing recommender
systems use collaborative filtering methods that base recommendations on other users' preferences.
By contrast, content-based methods use information about an item itself to make suggestions. This
approach has the advantage of being able to recommend previously unrated items to users with 
unique interes ... 



15 Challenges in information retrieval and language modeling: report of a workshop
held at the center for intelligent information retrieval, University of Massachusetts
Amherst, September 2002 

James Allan , Jay Aslam , Nicholas Belkin , Chris Buckley , Jamie Callan , Bruce Croft , Sue Dumais , 
Norbert Fuhr , Donna Harman , David J. Harper , Djoerd Hiemstra , Thomas Hofmann , Eduard Hovy , 
Wessel Kraaij , John Lafferty , Victor Lavrenko , David Lewis , Liz Liddy , R. Manmatha , Andrew 
McCallum , Jay Ponte , John Prager , Dragomir Radev , Philip Resnik , Stephen Robertson , Roni 
Rosenfeld , Salim Roukos , Mark Sanderson , Rich Schwartz , Amit Singhal , Alan Smeaton , Howard
Turtle , Ellen Voorhees , Ralph Weischedel , Jinxi Xu , ChengXiang Zhai 
ACM SIGIR Forum April 2003 
Volume 37 Issue 1 



16 Special issue on Machine learning methods for text and images: Matching words and
pictures 

Kobus Barnard , Pinar Duygulu , David Forsyth , Nando de Freitas , David M. Blei , Michael I. Jordan 
The Journal of Machine Learning Research March 2003 
Volume 3 

We present a new approach for modeling multi-modal data sets, focusing on the specific case of 
segmented images with associated text. Learning the joint distribution of image regions and words 
has many applications. We consider in detail predicting words associated with whole images (auto- 
annotation) and corresponding to particular image regions (region naming). Auto-annotation might 
help organize and access large collections of images. Region naming is a model of object recognition 
as a process ... 



17 Learning with mixtures of trees

Marina Meila , Michael I. Jordan

The Journal of Machine Learning Research September 2001 
Volume 1 

This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional 
domains. Mixtures-of-trees generalize the probabilistic trees of Chow and Liu (1968) in a different 
and complementary direction to that of Bayesian networks. We present efficient algorithms for 
learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss 
additional efficiencies that can be obtained when data are "sparse," and we present data structures 
and alg ... 



18 Scalable feature selection, classification and signature generation for organizing 
large text databases into hierarchical topic taxonomies 

Soumen Chakrabarti , Byron Dom , Rakesh Agrawal , Prabhakar Raghavan 

The VLDB Journal — The International Journal on Very Large Data Bases August 1998 

Volume 7 Issue 3 

We explore how to organize large text databases hierarchically by topic to aid better searching, 
browsing and filtering. Many corpora, such as internet directories, digital libraries, and patent 
databases are manually organized into topic hierarchies, also called taxonomies. Similar to indices for 
relational data, taxonomies make search and access more efficient. However, the exponential growth 
in the volume of on-line textual information makes it nearly impossible to maintain such taxono ... 



19 Summarization: The use of unlabeled data to improve supervised learning for text 
summarization 



Massih-Reza Amini , Patrick Gallinari 

Proceedings of the 25th annual international ACM SIGIR conference on Research and 

development in information retrieval August 2002 

With the huge amount of information available electronically, there is an increasing demand for 
automatic text summarization systems. The use of machine learning techniques for this task allows 
one to adapt summaries to the user needs and to the corpus characteristics. These desirable 
properties have motivated an increasing amount of work in this field over the last few years. Most 
approaches attempt to generate summaries by extracting sentence segments and adopt the 
supervised learning paradigm ... 

20 Fast supervised dimensionality reduction algorithm with applications to document 
categorization & retrieval 

George Karypis , Eui-Hong (Sam) Han 

Proceedings of the ninth international conference on Information and knowledge management 

November 2000 
21 Hypertext data mining (tutorial AM-1) 
Soumen Chakrabarti 

Tutorial notes of the sixth ACM SIGKDD International conference on Knowledge discovery and 
data mining August 2000 
1 Performance analysis of Godard-based blind channel identification 

Schniter, P.; 

Adaptive Systems for Signal Processing, Communications, and Control Symposium 
2000. AS-SPCC. The IEEE 2000 , 1-4 Oct. 2000 
2 Blind estimation of multiple co-channel digital signals in vector FIR 
channels 
1 Recognizing user context via wearable sensors 

Clarkson, B.; Mase, K.; Pentland, A.; 

Wearable Computers, 2000. The Fourth International Symposium on , 16-17 Oct. 